This document gives an overall description of the various declassified satellite imagery datasets that I’ve been exploring for the last few months. All of them were downloaded from https://earthexplorer.usgs.gov/.
The combined dataset consists of 1,415,108 images taken between 1960 and 1984 by 5 different satellite systems (see Appendix A for more information about the individual datasets).
The following plot shows the temporal distribution of photos, broken down by data source.
n_pics_per_year_grouped <- sat %>%
  mutate(Year = as.numeric(substr(`Acquisition Date`, 1, 4))) %>%
  filter(!is.na(Year)) %>%  # drop records with unparseable dates
  group_by(Year, `Data Source`) %>%
  summarise(n_pics = n(), .groups = "drop")
n_pics_per_year_grouped %>%
  ggplot(aes(x = Year)) +
  geom_area(aes(y = n_pics, fill = `Data Source`),
            alpha = 0.9) +
  scale_fill_manual(values = c("declass1" = "#7eb0d5", "declass2" = "#fd7f6f", "declass3" = "#01A66F")) +
  ylab("Number of Pictures")
It’s worth mentioning that the types of images available vary widely both within and across datasets. To take just one dimension of variation, here is the average footprint of photos for each dataset:
sat_sf <- st_as_sf(sat, wkt = "geometry")
# No CRS is set, so st_area returns unitless "squared degrees"
sat_sf$area <- st_area(sat_sf)
# Drop records with empty/degenerate footprints
sat_sf_trimmed <- sat_sf %>%
  filter(area != 0)
avgs <- sat_sf_trimmed %>%
  st_drop_geometry() %>%
  as.data.frame() %>%
  group_by(`Data Source`) %>%
  summarise(avg_area = mean(area))
# Calculate the average area of the geometries
mean(sat_sf_trimmed$area) # ~3.22, or about 69*69*3.22 = 15330 sq miles
## [1] 3.223478
avgs %>%
  ggplot(aes(x = `Data Source`, y = avg_area, fill = `Data Source`)) +
  geom_col(show.legend = FALSE) +
  scale_fill_manual(values = c("declass1" = "#7eb0d5", "declass2" = "#fd7f6f", "declass3" = "#01A66F")) +
  ylab("Average Area (in squared lat/lng degrees)")
Many of the frames, especially those from the earlier programs in declass1, have truly massive footprints. The unit here is squared degrees of latitude/longitude, which is imprecise (a degree of longitude shrinks as you move away from the equator), but near the equator one square degree corresponds to roughly 69 × 69 ≈ 4,800 square miles. The overall average frame in the dataset thus covers on the order of 3.22 × 4,800 ≈ 15,000 square miles.
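The conversion above can be made slightly more honest by accounting for latitude. This is a rough sketch, not part of the original analysis, and `sq_deg_to_sq_mi` is a hypothetical helper name:

```r
# Rough conversion from "squared degrees" to square miles.
# A degree of latitude is ~69 miles everywhere; a degree of
# longitude is ~69 * cos(latitude) miles, so the result depends
# on where the frame sits.
sq_deg_to_sq_mi <- function(area_sq_deg, lat_deg = 0) {
  area_sq_deg * 69 * 69 * cos(lat_deg * pi / 180)
}

sq_deg_to_sq_mi(3.22)      # at the equator: ~15,330 sq mi
sq_deg_to_sq_mi(3.22, 45)  # at 45 degrees N: ~10,840 sq mi
```

For frames at mid-latitudes (where much of this imagery was taken), the true area is therefore noticeably smaller than the equatorial estimate.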
library(shiny)
library(leaflet)  # needed for leafletOutput()/renderLeaflet()

# Define UI for the Shiny app
ui <- fluidPage(
  actionButton("sampleButton", "Re-sample"),
  leafletOutput("map")
)

generate_sample <- function(n) {
  sample_n(sat_sf_trimmed, n)
}

# Define server logic
server <- function(input, output) {
  # Draw a fresh sample whenever the button is clicked
  # (ignoreNULL = FALSE also fires once at startup)
  sampled_data <- eventReactive(input$sampleButton, {
    generate_sample(5)
  }, ignoreNULL = FALSE)

  # Render the leaflet map
  output$map <- renderLeaflet({
    leaflet() %>%
      addTiles() %>%
      addPolygons(data = sampled_data())
  })
}

# Run the Shiny app
shinyApp(ui, server)
typical_picture <- sat_sf_trimmed %>%
  sample_n(25)
str(typical_picture[1])
## sf [25 × 2] (S3: sf/tbl_df/tbl/data.frame)
## $ Entity ID: chr [1:25] "DS1101-1037DA014" "DS1052-1024DA072" "DS1029-1006DF071" "DS1110-1121DA008" ...
## $ geometry :sfc_POLYGON of length 25; first list element: List of 1
## ..$ : num [1:5, 1:2] 115 119 119 115 115 ...
## ..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA
## ..- attr(*, "names")= chr "Entity ID"
leaflet() %>%
  addTiles() %>%
  addPolygons(data = typical_picture)
Here’s a random sample of images with roughly that footprint:
typical_picture <- sat_sf_trimmed %>%
  filter(area < 3.23, area > 3.21) %>%
  sample_frac(0.1)
str(typical_picture[1])
## sf [59 × 2] (S3: sf/tbl_df/tbl/data.frame)
## $ Entity ID: chr [1:59] "DS009043007DA043" "DZB1212-500098L003001" "D3C1211-400692F001" "D3C1203-100065A040" ...
## $ geometry :sfc_POLYGON of length 59; first list element: List of 1
## ..$ : num [1:5, 1:2] 56.4 66.1 66.2 56.9 56.4 ...
## ..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA
## ..- attr(*, "names")= chr "Entity ID"
leaflet() %>%
  addTiles() %>%
  addPolygons(data = typical_picture)
# fac_caps <- fac_caps_orig
#
# # Build coverage DF
# coverage <- data.frame(year = integer(),
# num_extant_facilities = integer(),
# num_spotted_facilities = integer(),
# coverage = double())
#
# spotted <- c()
#
# # loop thru years
# for (year in as.integer(format(min_date, "%Y")):as.integer(format(max_date, "%Y"))) {
# year_date <- as.Date(paste(year, "-01-01", sep = ""))
#
# # extant facilities
# fac_exist <- facs %>%
# filter(start_date <= year_date) %>%
# distinct(facility_name) # Get unique facility names
#
# n_fac_exist <- nrow(fac_exist)
#
# # spotted facilities
# fac_spotted <- fac_caps %>%
# filter(`Acquisition Date` <= year_date) %>%
# distinct(facility_name) # Get unique facility names
#
# spotted <- unique(append(spotted,fac_spotted$facility_name))
# n_fac_spotted <- length(spotted)
#
# # remove earlier facilities for next loop
# fac_caps <- fac_caps %>%
# filter(`Acquisition Date` > year_date)
#
# coverage <- bind_rows(coverage, data.frame(year = year,
# num_extant_facilities = n_fac_exist,
# num_spotted_facilities = n_fac_spotted,
# coverage = n_fac_spotted / n_fac_exist))
# }
#
# ## Plot
# ggplot(coverage, aes(x = year)) +
# geom_ribbon(aes(ymin = 0, ymax = num_extant_facilities, fill = "Extant Facilities"), alpha = 0.75) +
# geom_ribbon(aes(ymin = 0, ymax = num_spotted_facilities, fill = "Spotted Facilities"), alpha = 0.75) +
# scale_fill_manual(values = c("Extant Facilities" = "#7eb0d5", "Spotted Facilities" = "#fd7f6f")) +
# labs(y = "Coverage",
# x = "Year",
# fill = "") +
# theme_minimal()
Declassified 1 is the product of a blanket declassification in 1995 and purportedly represents all of the images from the following satellite programs:
CORONA: 1960-1972
ARGON: 1962-1964
LANYARD: 1963
It’s not clear which images come from which satellite systems.
The dataset contains 837,088 images. A handful of different camera setups were used during the program:
unique(sat1$`Camera Resolution`)
## [1] "Vertical Medium" "Stereo Medium" "Vertical Low" "Vertical High"
## [5] "Stereo High"
unique(sat1$`Camera Type`)
## [1] "Vertical" "Aft" "Forward" "Cartographic"
It’s not clear how the resolution of these cameras compares to the later generations, KH-7 and KH-9.
Here is the total “footprint” of the images in this dataset (shapefile supplied by USGS):
Declassified 2 is the product of a 2002 declassification order and contains a non-comprehensive selection of imagery from the following programs:
KH-7 (GAMBIT): images taken between 1963 and 1967, the full lifespan of GAMBIT.
KH-9 (HEXAGON): images taken from 1973 to 1980, a subset of the operational period of HEXAGON.
It’s not clear whether all of the images from KH-7 were declassified or whether some were withheld. Only a subset of the KH-9 images were declassified.
The dataset contains 46,699 images.
KH-7 was used for higher-resolution surveillance. KH-9 carried both a lower-resolution mapping camera and a higher-resolution surveillance camera, but only the mapping imagery was included in this release:
unique(sat2$`Camera Resolution`)
## [1] "2 to 4 feet" "20 to 30 feet"
unique(sat2$`Camera Type`)
## [1] "KH-7 High Resolution Surveillance" "KH-9 Lower Resolution Mapping"
Here is the total “footprint” of the images in this dataset (shapefile supplied by USGS):
Declassified 3 is the product of a 2011 declassification order covering imagery from KH-9 (HEXAGON), which ran from 1971 to 1984. This includes images from the high-resolution surveillance cameras, but the website says that “almost all of the imagery from these cameras were declassified in 2011,” implying that some images remain classified.
The dataset contains 531,321 images. Note that the website says that “The process to ingest and generate browse imagery for Declass-3 is ongoing,” and suggests that the HEXAGON program generated over 670,000 scenes, indicating that the dataset we have access to is still missing a substantial chunk (on the order of 140,000) of the images from HEXAGON.
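Taking the site’s 670,000 figure at face value, the size of the gap is simple arithmetic (670,000 is an approximate number quoted by USGS, not an exact count):

```r
# Scenes reported for HEXAGON vs. scenes currently in Declass-3
hexagon_total  <- 670000  # approximate figure from the USGS site
declass3_count <- 531321  # images in the dataset as downloaded

hexagon_total - declass3_count                    # ~138,700 scenes not yet ingested
(hexagon_total - declass3_count) / hexagon_total  # ~21% of the program
```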
Both the terrain mapping and surveillance imagery were included in this declassification:
unique(sat3$`Camera Resolution`)
## [1] "2 to 4 feet" "20 to 30 feet"
unique(sat3$`Camera Type`)
## [1] "High Resolution Surveillance Camera - Forward"
## [2] "High Resolution Surveillance Camera - Aft"
## [3] "Lower Resolution Terrain Mapping Camera"
Here is the total “footprint” of the images in this dataset (shapefile supplied by USGS):